
    The Universe at Extreme Scale: Multi-Petaflop Sky Simulation on the BG/Q

    Remarkable observational advances have established a compelling, cross-validated model of the Universe. Yet two key pillars of this model -- dark matter and dark energy -- remain mysterious. Sky surveys that map billions of galaxies to explore the 'Dark Universe' demand a corresponding extreme-scale simulation capability; the HACC (Hybrid/Hardware Accelerated Cosmology Code) framework has been designed to deliver this level of performance now and into the future. With its novel algorithmic structure, HACC allows flexible tuning across diverse architectures, including accelerated and multi-core systems. On the IBM BG/Q, HACC attains unprecedented scalable performance -- currently 13.94 PFlops at 69.2% of peak and 90% parallel efficiency on 1,572,864 cores with an equal number of MPI ranks and a concurrency of 6.3 million. This level of performance was achieved at extreme problem sizes, including a benchmark run with more than 3.6 trillion particles, significantly larger than any cosmological simulation yet performed.
    Comment: 11 pages, 11 figures, final version of paper for talk presented at SC1
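    The reported figures are internally consistent: a minimal sketch, assuming the BG/Q's nominal per-core peak of 12.8 GFlop/s (1.6 GHz × 8 flops/cycle; a hardware assumption, not stated in the abstract), recovers the 69.2%-of-peak claim from the 13.94 PFlops sustained rate.

```python
# Hypothetical check of the reported HACC numbers, assuming a BG/Q
# per-core peak of 12.8 GFlop/s (1.6 GHz x 8 flops/cycle).

CORES = 1_572_864        # full 96-rack BG/Q core count, as reported
PEAK_PER_CORE = 12.8e9   # flop/s per core (assumption about the hardware)
SUSTAINED = 13.94e15     # flop/s sustained, as reported

peak = CORES * PEAK_PER_CORE         # machine peak, ~20.13 PFlop/s
fraction_of_peak = SUSTAINED / peak  # ~0.692, matching the 69.2% claim

print(f"machine peak: {peak / 1e15:.2f} PFlop/s")
print(f"fraction of peak: {fraction_of_peak:.1%}")
```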

    Scalable Reactive Molecular Dynamics Simulations for Computational Synthesis

    Reactive molecular dynamics (MD) simulation is a powerful research tool for describing chemical reactions. We eliminate the speed-limiting charge iteration in MD with a novel extended-Lagrangian scheme. The extended-Lagrangian reactive MD (XRMD) code drastically improves energy conservation while substantially reducing time-to-solution. Furthermore, we introduce a new polarizable charge equilibration (PQEq) model to accurately predict atomic charges and polarization. The XRMD code, based on hybrid message passing + multithreading, achieves a weak-scaling parallel efficiency of 0.977 on 786,432 IBM Blue Gene/Q cores for a 67.6 billion-atom system. The performance is portable to the second-generation Intel Xeon Phi, Knights Landing. Blue Gene/Q simulations enable the computational synthesis of materials via novel exfoliation mechanisms for producing the atomically thin transition metal dichalcogenide layers that are expected to dominate nanomaterials science in this century.
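    For readers unfamiliar with the metric, a weak-scaling parallel efficiency such as the reported 0.977 is conventionally computed as the ratio of the base-partition runtime to the full-machine runtime, with the per-core problem size held fixed. The timings below are invented for illustration; they are not measurements from the XRMD runs.

```python
# Hypothetical sketch of a weak-scaling efficiency calculation.
# Timings are illustrative, not from the paper.

def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Efficiency = base time / scaled time, with the per-core
    problem size held fixed as the core count grows."""
    return t_base / t_scaled

# e.g. 100 s on the smallest partition vs. 102.35 s on 786,432 cores
eff = weak_scaling_efficiency(100.0, 102.35)
print(f"weak-scaling efficiency: {eff:.3f}")
```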

    Improving GPU Performance Prediction with Data Transfer Modeling

    Accelerators such as graphics processors (GPUs) have become increasingly popular for high performance scientific computing. Often, much effort is invested in creating and optimizing GPU code without any guaranteed performance benefit. To reduce this risk, performance models can be used to project a kernel's GPU performance potential before it is ported. However, raw GPU execution time is not the only consideration. The overhead of transferring data between the CPU and the GPU is also an important factor; for some applications, this overhead may even erase the performance benefits of GPU acceleration. To address this challenge, we propose a GPU performance modeling framework that predicts both kernel execution time and data transfer time. Our extensions to an existing GPU performance model include a data usage analyzer for a sequence of GPU kernels, to determine the amount of data that needs to be transferred, and a performance model of the PCIe bus, to determine how long the data transfer will take. We have tested our framework using a set of applications running on a production machine at Argonne National Laboratory. On average, our model predicts the data transfer overhead with an error of only 8%, and the inclusion of data transfer time reduces the error in the predicted GPU speedup from 255% to 9%.
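    A minimal sketch of the kind of model the abstract describes: total GPU time as kernel time plus a PCIe transfer term modeled as fixed latency plus bytes over bandwidth. All parameter values and the helper names below are illustrative assumptions, not the paper's fitted model.

```python
# Hypothetical speedup model including CPU<->GPU transfer overhead.
# Latency and bandwidth values are assumptions for illustration.

PCIE_LATENCY = 10e-6    # seconds per transfer (assumed)
PCIE_BANDWIDTH = 6e9    # effective bytes/s over PCIe (assumed)

def transfer_time(n_bytes: float) -> float:
    """Fixed-latency-plus-bandwidth model of one host<->device transfer."""
    return PCIE_LATENCY + n_bytes / PCIE_BANDWIDTH

def predicted_speedup(cpu_time: float, kernel_time: float,
                      bytes_moved: float) -> float:
    """Speedup over the CPU once transfer overhead is included."""
    gpu_total = kernel_time + transfer_time(bytes_moved)
    return cpu_time / gpu_total

naive = 1.0 / 0.05                         # kernel-only prediction: 20x
real = predicted_speedup(1.0, 0.05, 2e9)   # with ~0.33 s of transfers
print(f"naive: {naive:.1f}x, with transfers: {real:.2f}x")
```

Ignoring the transfer term is exactly the failure mode the abstract quantifies: the kernel-only prediction can overstate the achievable speedup by a large factor.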